[Intel NPU] Add Windows & Linux Intel NPU support#1171
|
This is a screenshot of Sabaki testing. The binary release is here: https://github.com/Looong01/KataGo-Multi-backends/releases/tag/v1.16.4-openvino |
|
I partially referenced the code from #1164, and I am very grateful to @ChinChangYang |
|
This is a wonderful work! I will test this backend in a few days. |
|
I will implement an AMD NPU backend in a few days. |
|
Thank you! I am concerned about the poor performance of multi-threading. As shown in the figure, when the number of threads increases, the computation speed actually decreases. Is this because the NPU itself is not suitable for multi-threading, or is it still possible to optimize multi-threading at this stage? |
|
This is amazing on my Linux notebook. I am seeing a 3.5x speedup (87.30 vs 25.16 visits/s) compared to OpenCL, which seems unusually slow on my system. I really appreciate this. To build ONNX Runtime, I had to downgrade from gcc-15 to gcc-14. Also, the source directories seem to differ from the documentation, so I used the following commands in zsh. |
|
Claude detected an issue in a Docker container. Bug: Error message: Root cause:
Reproduction steps:
# 1. Clone and checkout this PR branch
git clone https://github.com/lightvector/KataGo.git
cd KataGo
git fetch origin pull/1171/head:pr-1171
git checkout pr-1171
# 2. Download ORT prebuilt + build onnx_proto and protobuf-lite from source
# (ONNXRUNTIME_ROOT = prebuilt ORT package dir)
# (ONNX_INCLUDE_DIR = ort-build/_deps/onnx-build)
# (ONNX_PROTO_LIB = ort-build/_deps/onnx-build/libonnx_proto.a)
# (PROTOBUF_INCLUDE_DIR = ort-build/_deps/protobuf-src/src)
# (PROTOBUF_LIB = ort-build/_deps/protobuf-build/libprotobuf-lite.a)
# 3. Configure
mkdir build && cd build
cmake ../cpp \
-DUSE_BACKEND=ONNX \
-DKATAGO_AUTO_FETCH_DEPS=OFF \
-DONNXRUNTIME_ROOT=<ort-prebuilt-dir> \
-DONNX_INCLUDE_DIR=<ort-build>/_deps/onnx-build \
-DONNX_PROTO_LIB=<ort-build>/_deps/onnx-build/libonnx_proto.a \
-DPROTOBUF_INCLUDE_DIR=<ort-build>/_deps/protobuf-src/src \
-DPROTOBUF_LIB=<ort-build>/_deps/protobuf-build/libprotobuf-lite.a \
-DCMAKE_CXX_FLAGS="-DONNX_ML"
# 4. Build → fails at onnxmodelbuilder.cpp
cmake --build . -j$(nproc)
System environment:
Fix: In
-#include <onnx/onnx-ml.pb.h>
+#include <onnx/onnx_pb.h>
|
Actually, my CMakeLists.txt handles this well; I use vcpkg to manage these deps. Or do you still think I need to make this change? |
Because the NPU is a different architecture (totally different from a GPU or CPU), single threading is enough for it. |
I think you misunderstood my comment. The reproduction steps fetch #1171, exactly this PR, not mine. |
But I don't get any error when I compile it. Maybe it only happens with GCC-15? |
11433e6 resolves the issue. Thanks. |
This has been working perfectly for the past month. It would be great to have this feature merged into the official KataGo. Without it, I would have almost had to give up on KataGo after moving to my new PC. Thank you again, @Looong01. On the Intel Core Ultra 7 255U, OpenCL KataGo is sadly slow, running at less than half the speed of a 5-year-old system with a Core i7-1165G7. |
|
|
Thanks, I'll also look at this soon. |
Thanks! |
|
Thank you for the updates. b37aa25 works fine with the following minor corrections to the
In my environment, I also needed to downgrade GCC when running
At the moment, this is the only branch that runs fast enough for practical use in my environment. I would appreciate official support for this. |
|
@Looong01 I don't know if this is a problem with the original or with OpenVINO.
This is a DEVICE_LOST from the Intel NPU that occurred mid-inference, after roughly 6 hours of self-play (~22,950 games).
Possible causes:
1. Driver update during the run. The error message itself says "or driver update occurred." Windows Update silently pushes Intel NPU driver updates in the background. If it updated the NPU driver during the 6-hour run, the device gets re-enumerated and all existing Level Zero contexts/handles are invalidated, causing in-flight inference to fail with device lost.
2. Resource accumulation. ~22,950 games with many inferences each is a very large volume. If the intel_npu plugin or this OpenVINO version leaks memory or handles when repeatedly creating/destroying infer requests, accumulation past some threshold can hang the NPU firmware, triggering a GPU-TDR-style reset. The fact that it crashed after 6 hours rather than immediately is consistent with an accumulation-type issue.
3. Watchdog timeout. A single inference stalls past the watchdog timeout, the NPU is force-reset, and all subsequent command-queue submissions fail. Possible under sustained load, but NPU power draw is low, so this is the least likely.
Suggested fix: add auto-restart + error recovery to the self-play loop. This is the most practical fix: after a device lost, the current process generally has to rebuild the ONNX Runtime session (re-initialize the Level Zero context), so simply catching the exception and continuing will likely fail on all subsequent inferences. The most robust approach is to have an outer script detect this error code, kill the process, and relaunch it, resuming from the last SGF/checkpoint. |
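The outer-script approach suggested above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the marker string `DEVICE_LOST`, the function names `run_once` and `supervise`, and the restart limits are hypothetical, and the exact text KataGo or ONNX Runtime prints on a device-lost error may differ. Resuming from the last SGF/checkpoint is left to the relaunched command itself.

```python
import subprocess
import sys
import time

# Assumption: a substring expected in the process output when the NPU is lost.
# The real message may differ; adjust after checking an actual log.
DEVICE_LOST_MARKER = "DEVICE_LOST"


def run_once(cmd):
    """Run cmd, echo its output, and kill it as soon as the marker appears.

    Returns (exit_code, device_lost_seen).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    device_lost = False
    for line in proc.stdout:
        sys.stdout.write(line)
        if DEVICE_LOST_MARKER in line:
            device_lost = True
            proc.kill()  # the session is unusable after device lost
            break
    proc.stdout.close()
    return proc.wait(), device_lost


def supervise(cmd, max_restarts=5, delay_s=30.0):
    """Relaunch cmd after device-lost failures, up to max_restarts restarts.

    Returns (number_of_launches, last_exit_code).
    """
    launches = 0
    while True:
        launches += 1
        code, device_lost = run_once(cmd)
        if not device_lost:
            return launches, code  # clean exit or an unrelated failure
        if launches > max_restarts:
            return launches, code  # give up; something is persistently wrong
        time.sleep(delay_s)  # give the driver time to re-enumerate the device
```

To use it, pass the full self-play command line as a list to `supervise`. Unrelated failures are deliberately not retried, so a configuration error does not loop forever.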







Summary
This PR adds and hardens the Windows & Linux Intel NPU path for KataGo using the ONNX backend with ONNX Runtime + OpenVINO Execution Provider, and updates docs/config guidance for an end-to-end workflow.
It also improves failure behavior for non-ONNX builds and simplifies Windows & Linux dependency handling.
What Changed
1) ONNX backend and OpenVINO provider support
- onnxProvider (cpu, openvino, cuda, tensorrt, migraphx, coreml).
- onnxOpenVINODeviceType
- onnxOpenVINODeviceId
- onnxOpenVINOCacheDir
- onnxOpenVINOEnableNPUFastCompile (best-effort; depends on ORT build support)
- .onnx models directly
- .bin/.bin.gz models via internal conversion to ONNX graph
2) exportonnx command behavior
- exportonnx is available in ONNX builds and exports fixed-size ONNX models.
- (-x/-y can override).
- exportonnx now returns a clear error instead of failing ambiguously.
3) Config safety for non-ONNX binaries
- onnx* config keys now fails fast with a clear message.
4) CMake dependency flow
- ONNXRUNTIME_ROOT (defaulting to cpp/external/onnxruntime-win-x64-openvino and cpp/external/onnxruntime-linux-x64-openvino).
- zlib, onnx, protobuf) through vcpkg when enabled.
5) Documentation updates
- Compiling.md:
  - (use_openvino=NPU)
  - cpp/external/onnxruntime-win-x64-openvino.
- README.md:
  - exportonnx (default 19x19)
  - benchmark
  - gtp
Behavior Notes
- onnxDeviceToUseThread*) is mainly intended for ONNX providers like CUDA/TensorRT/MIGraphX.
Validation
- exportonnx works from .bin/.bin.gz -> .onnx.
- benchmark/gtp run with onnxProvider=openvino and onnxOpenVINODeviceType=NPU.
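As a rough illustration of the validated setup, the two keys named above might appear in a KataGo config file as shown below. The key names and values are taken from the validation note; that they sit at the top level of the .cfg file and that no other keys are required are assumptions, so treat this as a sketch, not a complete configuration.

```
# Assumed fragment of a KataGo .cfg file for an ONNX build.
# Key names and values come from the PR description; placement is an assumption.
onnxProvider = openvino
onnxOpenVINODeviceType = NPU
```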